On robust regression with high-dimensional predictors.

Authors

  • Noureddine El Karoui
  • Derek Bean
  • Peter J Bickel
  • Chinghway Lim
  • Bin Yu
Abstract

We study regression M-estimates in the setting where $p$, the number of covariates, and $n$, the number of observations, are both large, but $p \le n$. We find an exact stochastic representation for the distribution of $\hat{\beta} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(Y_i - X_i'\beta)$ at fixed $p$ and $n$ under various assumptions on the objective function $\rho$ and our statistical model. A scalar random variable whose deterministic limit $r_\rho(\kappa)$ can be studied when $p/n \to \kappa > 0$ plays a central role in this representation. We discover a nonlinear system of two deterministic equations that characterizes $r_\rho(\kappa)$. Interestingly, the system shows that $r_\rho(\kappa)$ depends on $\rho$ through proximal mappings of $\rho$ as well as various aspects of the statistical model underlying our study. Several surprising results emerge. In particular, we show that, when $p/n$ is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.
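To make the phrase "proximal mappings of ρ" concrete, here is a minimal sketch for the two losses the abstract compares: squared loss (least squares) and absolute loss (least absolute deviations). It assumes the common convention prox(x) = argmin over y of c·ρ(y) + (x − y)²/2; the abstract does not state the paper's own parametrization, and the function names `prox_squared`, `prox_abs`, and `prox_grid` are hypothetical, introduced only for illustration.

```python
import math


def prox_squared(x, c):
    """Prox of rho(y) = y**2 / 2 under the assumed convention.

    Minimizes c * y**2 / 2 + (x - y)**2 / 2, which gives linear shrinkage.
    """
    return x / (1.0 + c)


def prox_abs(x, c):
    """Prox of rho(y) = |y| under the assumed convention.

    Minimizes c * |y| + (x - y)**2 / 2, which gives soft-thresholding at c.
    """
    return math.copysign(max(abs(x) - c, 0.0), x)


def prox_grid(rho, x, c, half_width=10.0, steps=200001):
    """Brute-force check: minimize c * rho(y) + (x - y)**2 / 2 on a grid."""
    best_y, best_val = None, float("inf")
    for k in range(steps):
        y = -half_width + 2.0 * half_width * k / (steps - 1)
        val = c * rho(y) + 0.5 * (x - y) ** 2
        if val < best_val:
            best_y, best_val = y, val
    return best_y


if __name__ == "__main__":
    x, c = 3.0, 1.0
    # Closed forms next to the brute-force minimizer for comparison.
    print(prox_squared(x, c), prox_grid(lambda y: 0.5 * y * y, x, c))
    print(prox_abs(x, c), prox_grid(abs, x, c))
```

The closed forms show how differently the two losses treat a residual: squared loss shrinks it proportionally, while absolute loss subtracts a fixed amount and sets small values to zero. The grid search is only a sanity check on the closed forms, not part of any estimation procedure.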

Similar resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Robust Estimation in Linear Regression with Multicollinearity and Sparse Models

One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...

Sufficient Dimension Reduction With Missing Predictors

In high-dimensional data analysis, sufficient dimension reduction (SDR) methods are effective in reducing the predictor dimension, while retaining full regression information and imposing no parametric models. However, it is common in high-dimensional data that a subset of predictors may have missing observations. Existing SDR methods resort to the complete-case analysis by removing all the sub...

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

A robust method for ultra-high dimensional regression analysis

To increase the estimation accuracy and reduce the computational cost in ultrahigh dimensional regression analysis, ? proposed Sure Independence Screening (SIS) which selects a subset of the variables before estimating the regression coefficients. Predictor variables are selected according to the magnitude of their marginal correlations with the response variable. ? proved that SIS shares the S...

Classification by ensembles from random partitions of high-dimensional data

A robust classification procedure is developed based on ensembles of classifiers, with each classifier constructed from a different set of predictors determined by a random partition of the entire set of predictors. The proposed methods combine the results of multiple classifiers to achieve a substantially improved prediction compared to the optimal single classifier. This approach is designed ...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 110, Issue 36

Pages: -

Publication date: 2013